Practical Lineage Tracing in Data Warehouses

Authors

  • Yingwei Cui
  • Jennifer Widom
Abstract

We consider the view data lineage problem in a warehousing environment: For a given data item in a materialized warehouse view, we want to identify the set of source data items that produced the view item. We formalize the problem, and we present a lineage tracing algorithm for relational views with aggregation. Based on our tracing algorithm, we propose a number of schemes for storing auxiliary views that enable consistent and efficient lineage tracing in a multi-source data warehouse. We report on a performance study of the various schemes, identifying which schemes perform best in which settings. Based on our results, we have implemented a lineage tracing package in the WHIPS data warehousing system prototype at Stanford. With this package, users can select view tuples of interest, then efficiently drill through to examine the exact source tuples that produced the view tuples of interest.
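The problem statement in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm or the WHIPS package. The `sales` table, the `build_view` and `trace_lineage` helpers, and the `min_quantity` filter are all hypothetical names chosen for illustration. The sketch assumes a single-source view that filters rows and then sums a quantity per group. Under that assumption, the lineage of a view tuple is the set of source rows that pass the filter and fall into the tuple's group.

```python
from collections import defaultdict

# Hypothetical source table: (store, item, quantity).
sales = [
    ("Palo Alto", "binder", 10),
    ("Palo Alto", "pencil", 5),
    ("San Jose", "binder", 7),
    ("San Jose", "stapler", 0),
]


def build_view(rows, min_quantity=1):
    """Materialize an aggregate view: total quantity per store,
    counting only rows whose quantity is at least min_quantity."""
    totals = defaultdict(int)
    for store, _item, qty in rows:
        if qty >= min_quantity:
            totals[store] += qty
    return dict(totals)


def trace_lineage(rows, view_key, min_quantity=1):
    """Return the source rows that contributed to the view tuple for
    view_key: rows that pass the view's filter and share its group key."""
    return [r for r in rows if r[2] >= min_quantity and r[0] == view_key]


view = build_view(sales)
lineage = trace_lineage(sales, "San Jose")
```

Recomputing the view over only the traced rows reproduces the selected view tuple, which is a simple sanity check on this sketch: `build_view(lineage)["San Jose"]` equals `view["San Jose"]`.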

Similar resources

Investigating a heterogeneous data integration approach for data warehousing

Data warehouses integrate data from remote, heterogeneous, autonomous data sources into a materialised central database. The heterogeneity of these data sources has two aspects, data expressed in different data models, called model heterogeneity, and data expressed within different schemas of the same data model, called schema heterogeneity. AutoMed is an approach to heterogeneous data transfor...

Research Problems in Data Provenance

The problem of tracing the provenance (also known as lineage) of data is a ubiquitous problem that is frequently encountered in databases that are the result of many transformation steps. Scientific databases and data warehouses are some examples of such databases. However, contributions from the database research community towards this problem have been somewhat limited. In this paper, we mot...

Lineage Tracing in a Data Warehousing System

A data warehousing system collects data from multiple distributed sources and stores the integrated information as materialized views in a local data warehouse. Users then perform data analysis and mining on the warehouse views. Figure 1 shows the basic architecture of a data warehousing system. In many cases, the warehouse view contents alone are not sufficient for in-depth analysis. It is often...

Lineage Tracing in a Data Warehousing System Demonstration Proposal

A data warehousing system collects data from multiple distributed sources and stores the integrated information as materialized views in a local data warehouse. Users then perform data analysis and mining on the warehouse views. Figure shows the basic architecture of a data warehousing system. In many cases, the warehouse view contents alone are not sufficient for in-depth analysis. It is often usefu...

Data Lineage: A Survey

Lineage, or provenance, in its most general form describes where data came from, how it was derived, and how it was updated over time. Information management systems today exploit lineage in tasks ranging from data verification in curated databases [1] to confidence computation in probabilistic databases [10, 12]. Here, we formalize and categorize lineage, discuss a set of selected papers, and ...

Publication date: 2000